Cryptogram Decoding for Optical Character Recognition

Authors

  • Gary Huang
  • Erik Learned-Miller
  • Andrew McCallum
Abstract

Optical character recognition (OCR) systems for machine-printed documents typically require large numbers of font styles and character models to work well. When given a document printed in an unseen font, the performance of those systems degrades even in the absence of noise. In this paper, we perform OCR in an unsupervised fashion, without using any character models, by applying a cryptogram decoding algorithm. We present results on real and artificial OCR data.
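
The abstract does not spell out the decoding algorithm, so the following is only a minimal sketch of generic cryptogram decoding, not the authors' method. It assumes that glyph images have already been clustered so that each word is a tuple of integer cluster IDs (an assumption about the preprocessing), and that a small word list is available. The names `pattern` and `solve` are hypothetical. Each cluster ID is treated as a cipher symbol, and a backtracking search looks for a one-to-one assignment of letters that turns every word into a lexicon entry.

```python
from collections import defaultdict


def pattern(seq):
    """Repetition pattern of a sequence, e.g. 'hello' -> (0, 1, 2, 2, 3)."""
    first_seen = {}
    return tuple(first_seen.setdefault(s, len(first_seen)) for s in seq)


def solve(cipher_words, lexicon):
    """Return a dict {cluster_id: letter}, or None if no consistent mapping exists.

    cipher_words: iterable of tuples of cluster IDs (assumed input format).
    lexicon: iterable of plain-text words.
    """
    # Index the lexicon by repetition pattern so that only words with the
    # same letter-repeat structure are considered as candidates.
    by_pattern = defaultdict(list)
    for word in lexicon:
        by_pattern[pattern(word)].append(word)

    # Try the most constrained cipher words (fewest candidates) first.
    todo = sorted(set(cipher_words), key=lambda w: len(by_pattern[pattern(w)]))

    def extend(mapping, cipher_word, plain_word):
        """Add cipher_word -> plain_word to the mapping, or return None on conflict."""
        new = dict(mapping)
        used = set(mapping.values())
        for c, p in zip(cipher_word, plain_word):
            if c in new:
                if new[c] != p:
                    return None  # symbol already bound to another letter
            elif p in used:
                return None  # letter already claimed by another symbol
            else:
                new[c] = p
                used.add(p)
        return new

    def search(i, mapping):
        if i == len(todo):
            return mapping
        for plain in by_pattern[pattern(todo[i])]:
            nxt = extend(mapping, todo[i], plain)
            if nxt is not None:
                result = search(i + 1, nxt)
                if result is not None:
                    return result
        return None

    return search(0, {})


# Toy usage: cluster IDs 7, 3, 1, 9, 5 stand in for unknown glyph shapes.
words = [(7, 3, 1, 7), (7, 3, 9), (5, 1, 7)]
mapping = solve(words, ["the", "cat", "hat", "that"])
decoded = ["".join(mapping[g] for g in w) for w in words]
```

A sketch like this ignores clustering errors and out-of-vocabulary words, both of which a practical system on real OCR data would have to tolerate.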

Similar resources

Chinese character decoding: a semantic bias?

The effects of semantic and phonetic radicals on Chinese character decoding were examined. Our results suggest that semantic and phonetic radicals are each available for access when a corresponding task emphasizes one or the other kind of radical. But in a more neutral lexical recognition task, the semantic radical is more informative. Semantic radicals that correctly pertain to character meani...

Task-specific minimum Bayes-risk decoding using learned edit distance

This paper extends the minimum Bayes-risk framework to incorporate a loss function specific to the task and the ASR system. The errors are modeled as a noisy channel and the parameters are learned from the data. The resulting loss function is used in the risk criterion for decoding. Experiments on a large vocabulary conversational speech recognition system demonstrate significant gains of about...

Research Report on Bangla OCR Training and Testing Methods

In this paper we present the training and recognition mechanism of a Hidden Markov Model (HMM) based multi-font Optical Character Recognition (OCR) system for Bengali characters. In our approach, the central idea is to separate the HMM model for each segmented character or word. The system uses the HTK toolkit for data preparation, model training and recognition. The features of each trained charact...

Prosodic modeling of Mandarin speech and its application to lexical decoding

In this paper, a new RNN-based prosodic modeling method for Mandarin speech recognition is proposed. It is performed in the post-processing stage of the acoustic decoding aiming at detecting word boundaries for assisting in the lexical decoding. It employs a simple RNN to learn the relationship between input prosodic features, extracted from the input utterance with syllable boundaries provided...

2D Markovian modeling for character recognition and segmentation

Processing text components in multimedia contents remains a challenging issue for document indexing and retrieval. More specifically, handwritten characters processing is a very active field of pattern recognition. This paper describes an innovative two-dimensional approach for character recognition and segmentation. The method proposed combines Markovian modeling, efficient decoding algorithm ...

Publication date: 2006